On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias

Neural Information Processing Systems

We study the dynamics and implicit bias of gradient flow (GF) on univariate ReLU neural networks with a single hidden layer in a binary classification setting. We show that when the labels are determined by the sign of a target network with $r$ neurons, with high probability over the initialization of the network and the sampling of the dataset, GF converges in direction (suitably defined) to a network achieving perfect training accuracy and having at most $\mathcal{O}(r)$ linear regions, implying a generalization bound. Unlike many other results in the literature, under an additional assumption on the distribution of the data, our result holds even for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
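To make the setting concrete, here is a minimal NumPy sketch (not taken from the paper) of a shallow univariate ReLU network f(x) = sum_j v_j ReLU(w_j x + b_j) and of bounding its number of linear regions on an interval via the breakpoints x = -b_j / w_j. The function names, width, and interval are illustrative choices, not the paper's construction.

```python
# Minimal sketch of a shallow univariate ReLU network and a count of its
# linear regions on an interval. All names and constants are illustrative.
import numpy as np

def relu_net(x, w, b, v):
    """Evaluate f(x) = sum_j v_j * max(w_j * x + b_j, 0) for an array of scalar inputs x."""
    x = np.atleast_1d(x)
    return (v * np.maximum(np.outer(x, w) + b, 0.0)).sum(axis=1)

def count_linear_regions(w, b, lo=-1.0, hi=1.0, tol=1e-8):
    """Upper-bound the number of linear regions of the network on [lo, hi].

    Each neuron with |w_j| > tol contributes a potential breakpoint at
    x = -b_j / w_j; the distinct breakpoints inside (lo, hi) split the
    interval into at most (#breakpoints + 1) linear pieces.
    """
    active = np.abs(w) > tol
    kinks = -b[active] / w[active]
    kinks = np.unique(np.round(kinks[(kinks > lo) & (kinks < hi)], 8))
    return len(kinks) + 1

rng = np.random.default_rng(0)
width = 50
w = rng.normal(size=width)
b = rng.normal(size=width)
v = rng.normal(size=width) / width
print(relu_net(np.array([0.0, 0.5]), w, b, v))
print("linear regions on [-1, 1]:", count_linear_regions(w, b))
```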


Reviews: Gradient Dynamics of Shallow Univariate ReLU Networks

Neural Information Processing Systems

In this paper, the authors use a neural network with one hidden layer and one output layer. The activation function is ReLU, and each hidden unit also has a trainable bias term. The input is 1-dimensional, i.e., a scalar, so the task is equivalent to interpolation. The loss function is the square loss, and the authors use gradient descent to learn the network weights. The authors work in an over-parametrized regime in which the number of hidden units tends to infinity, i.e., the network is infinitely wide. The paper considers two main learning regimes: kernel and adaptive.
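For concreteness, the following is a minimal PyTorch sketch of the setup described above: one ReLU hidden layer with trainable biases, scalar input and output, square loss, and plain (full-batch) gradient descent. The width, learning rate, and toy target are illustrative assumptions, not the paper's experimental settings.

```python
# Minimal sketch of the reviewed setup: 1D input, one ReLU hidden layer with
# biases, square loss, full-batch gradient descent. All hyperparameters and
# the toy target are illustrative.
import torch

torch.manual_seed(0)
width = 1000  # large width, in the spirit of the over-parametrized regime

model = torch.nn.Sequential(
    torch.nn.Linear(1, width),   # hidden layer: w_j * x + b_j
    torch.nn.ReLU(),
    torch.nn.Linear(width, 1),   # output layer
)

# Toy 1D interpolation data.
x = torch.linspace(-1, 1, 20).unsqueeze(1)
y = torch.sin(3 * x)

opt = torch.optim.SGD(model.parameters(), lr=0.1)  # plain gradient descent
for step in range(2000):
    loss = ((model(x) - y) ** 2).mean()            # square loss
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final training loss: {loss.item():.4e}")
```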

